Python
Machine Learning
Regression
Cost Estimation
Streamlit
Project Overview
Healthcare costs are a significant concern for individuals and insurance companies. This project focuses on predicting medical insurance charges based on various factors such as age, BMI, smoking habits, and region.
Initially, linear regression was tested but proved insufficient in capturing complex relationships. Polynomial regression was then applied, improving accuracy by accounting for nonlinear dependencies.
The model helps individuals plan for expenses and assists insurance providers in pricing policies effectively.
Key Insights
- Smoking significantly increases insurance costs, emphasizing its impact on healthcare expenses.
- Higher BMI leads to increased charges, reinforcing the importance of preventive healthcare.
- Individuals from the Southeast region pay the highest premiums, suggesting regional disparities in healthcare costs.
- Families with 2-3 children face higher insurance charges, highlighting family size as a cost factor.
- Young adults (20-25) tend to pay higher charges, possibly due to lifestyle factors or lack of preventive care access.
Technical Implementation
Data Preprocessing:
- Handled missing values and performed feature engineering.
- Encoded categorical variables such as smoking status and region.
Feature Selection & Transformation:
- Explored correlation between features and insurance charges.
- Applied polynomial transformation to capture nonlinear dependencies.
Model Selection:
- Tested Linear Regression initially but found limitations in capturing complexities.
- Switched to Polynomial Regression, improving model performance.
Model Evaluation:
- Achieved an R-squared score of 0.81, indicating strong predictive power.
- Assessed model accuracy using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).
Key Learnings
- Nonlinear relationships in data require advanced techniques like polynomial regression.
- Feature engineering is crucial for improving model performance.
- Lifestyle choices (e.g., smoking, obesity) play a significant role in healthcare costs.
- Understanding regional healthcare cost variations can help insurance companies tailor pricing strategies.